Evaluation of Text Document Clustering Using K-Means
Language and written text are the fundamentals of human communication. Social media is an essential source of data on the Internet, and email and text messages are also among the main sources of textual data. Text data is processed and analyzed with text mining methods. Text mining is the extension of data mining to text files: it extracts relevant information from large amounts of text data and recognizes patterns in it. Cluster analysis is one of the most important text mining methods. Its goal is to automatically partition a set of objects into a finite number of homogeneous groups (clusters). Objects within a group should be as similar as possible, whereas objects from different groups should have different characteristics. The starting point of cluster analysis is a precise definition of the task and the selection of representative data objects. A challenge with text documents is their unstructured form, which requires extensive pre-processing. Natural Language Processing (NLP) is used for the automated processing of natural language. Text files can be converted into numerical form using the Bag-of-Words (BoW) approach or neural networks. Each data object can then be represented as a point in a finite-dimensional space whose dimension corresponds to the number of unique tokens, here words. Before the actual cluster analysis, a measure must also be defined to determine the similarity or dissimilarity between the objects; dissimilarity is measured, for example, with metrics such as the Euclidean distance. Then clustering methods are applied. Clustering methods can be divided into different categories. On the one hand, there are methods that form a hierarchical system; these are called hierarchical clustering methods. On the other hand, there are techniques that divide the objects into a predetermined number of groups by optimizing a homogeneity measure. The procedures of this class are called partitioning methods. An important representative is the k-Means method, which is used in this thesis. Finally, the results are evaluated and interpreted. This thesis introduces the methods used in the individual steps of the cluster analysis. To determine which method appears most suitable for clustering documents, a practical investigation was carried out on three different data sets.
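The pipeline outlined in this abstract (Bag-of-Words vectors, Euclidean distance as the dissimilarity measure, k-Means with a predetermined number of clusters) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the tokenizer, the toy documents, and the deterministic initialization (the first k documents as starting centroids) are all assumptions. Practical setups typically add stop-word removal, stemming or TF-IDF weighting, and a smarter initialization such as k-means++.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Minimal pre-processing (assumption): lowercase, keep alphabetic runs as tokens.
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)  # one dimension per unique token
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        vectors.append(vec)
    return vocab, vectors


def euclidean(a, b):
    """Euclidean distance used as the dissimilarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iter=100):
    """Lloyd-style k-means; the number of clusters k is fixed in advance."""
    # Deterministic initialization (assumption): the first k points are the
    # starting centroids. k-means++ or random restarts are common alternatives.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy corpus: two "pet" documents and two "market" documents.
docs = [
    "cat dog cat pet",
    "stock market price",
    "dog pet cat",
    "market stock trade price",
]
vocab, vectors = bag_of_words(docs)
labels, centroids = k_means(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

Because k is an explicit argument, the sketch reflects the defining property of partitioning methods: the number of groups is fixed before clustering starts. The result also depends on the initial centroids, which is why the choice of initialization matters in practice.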
Data Science Meets Nuclear - What Data Analytics, Computational Intelligence and Machine Learning Can Contribute to Nuclear Waste Management and Nuclear Verification
Data science is a multidisciplinary field that deals with all aspects of data, from its generation and processing to its conversion into a valuable source of knowledge. While data science has a wide range of applications, to what extent have new data science methods made their way into research on nuclear waste management and nuclear verification? And which further research questions in these fields would particularly benefit from new data science methods? Accordingly, this paper has two objectives: first, to highlight the state of the art of data science in nuclear waste management and nuclear verification; second, to discuss the potential use of data science in these fields. Ideas for data science in nuclear waste management include, e.g., i) facilitating the integration, analytics and visualization of data in the comparative selection process for a geological repository site, ii) creating a virtual geological repository system, and iii) monitoring a geological repository over its life-cycle phases. In nuclear verification, data science can make a significant contribution to i) unattended monitoring using, e.g., seals/tags, surveillance (optical, 2D/3D laser, gamma, etc.) and radiation measurements; ii) perimeter monitoring through surveillance (optical, gamma, thermal, etc.) and radiation measurements; and iii) wide-area monitoring using satellite imagery, geophysical monitoring, environmental sampling, etc.
NCI Comparative Oncology Program Testing of Non-Camptothecin Indenoisoquinoline Topoisomerase I Inhibitors in Naturally Occurring Canine Lymphoma
Purpose: Only one chemical class of topoisomerase I (TOP1) inhibitors, the camptothecins, is FDA approved, with irinotecan and topotecan in wide use. Because of their limitations (chemical instability, drug efflux-mediated resistance, and diarrhea), novel TOP1 inhibitors are warranted. Indenoisoquinoline non-camptothecin TOP1 inhibitors overcome the chemical instability and drug resistance that limit camptothecin use. Three indenoisoquinolines, LMP400 (indotecan), LMP776 (indimitecan), and LMP744, were examined in a phase I study in lymphoma-bearing dogs to evaluate differential efficacy, pharmacodynamics, toxicology, and pharmacokinetics.

Experimental design: Eighty-four client-owned dogs with lymphomas were enrolled in dose-escalation cohorts for each indenoisoquinoline, with an expansion phase for LMP744. Efficacy, tolerability, pharmacokinetics, and target engagement were determined.

Results: The maximum tolerated doses (MTDs) were 17.5 mg/m² for LMP776 and 100 mg/m² for LMP744; bone marrow toxicity was dose-limiting. LMP400 was well tolerated up to 65 mg/m², and its MTD was not reached. None of the drugs induced notable diarrhea. Sustained tumor accumulation was observed for LMP744, and γH2AX induction was demonstrated in tumors 2 and 6 hours after treatment. A decrease in TOP1 protein was observed in most lymphoma samples across all compounds and dose levels, which is consistent with the tumor responses also observed at low doses of LMP744. Objective responses were documented for all indenoisoquinolines; efficacy was greatest for LMP744 (13/19 dogs).

Conclusions: These results demonstrate proof of mechanism for indenoisoquinoline TOP1 inhibitors and support their further clinical development. They also highlight the value of the NCI Comparative Oncology Program (https://ccr.cancer.gov/Comparative-Oncology-Program) for evaluating novel therapies in immunocompetent pets with cancer.